
The Wandering Angel http://www.flickr.com/photos/wandering_angel/1467802750/ - CC-BY
Wherein we discover the gateways to dynamic processes on a server.
Homework Review and Questions
What if you want to pass information to that script?
How can you give the script access to information about the HTTP request itself?
A computer has an environment:
in *nix, you can see this in a shell:
$ printenv
TERM_PROGRAM=iTerm.app
...
or in Windows at the command prompt:
C:\> set
ALLUSERSPROFILE=C:\ProgramData
...
or in PowerShell:
PS C:\> Get-ChildItem Env:
ALLUSERSPROFILE C:\ProgramData
...
In a bash shell we can do this:
$ export VARIABLE='some value'
$ echo $VARIABLE
some value
or at a Windows command prompt:
C:\Users\Administrator\> set VARIABLE='some value'
C:\Users\Administrator\> echo %VARIABLE%
'some value'
or in PowerShell:
PS C:\> $env:VARIABLE = "some value"
PS C:\> Get-ChildItem Env:VARIABLE
VARIABLE some value
These new values are now part of the environment
*nix:
$ printenv
...
VARIABLE=some value
Windows:
C:\> set
...
VARIABLE='some value'
PowerShell:
PS C:\> Get-ChildItem Env:
...
VARIABLE some value
We can see this environment in Python, too:
$ python
>>> import os
>>> print(os.environ['VARIABLE'])
some value
>>> print(list(os.environ.keys()))
['VERSIONER_PYTHON_PREFER_32_BIT', 'VARIABLE',
'LOGNAME', 'USER', 'PATH', ...]
You can alter os environment values while in Python:
>>> os.environ['VARIABLE'] = 'new_value'
>>> print(os.environ['VARIABLE'])
new_value
But that doesn't change the original value, outside Python:
>>> ^D
$ echo this is the value: $VARIABLE
this is the value: some value
or at a Windows command prompt:
C:\Users\Administrator\> echo %VARIABLE%
'some value'
subprocess.Popen(args, bufsize=0, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=False,
shell=False, cwd=None, env=None, # <-------
universal_newlines=False, startupinfo=None,
creationflags=0)
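The env argument marked above is how a parent process hands an environment to a child. As a minimal sketch (the name VARIABLE comes from the earlier examples; the value and the child one-liner are made up for illustration), a copy of os.environ can be modified and passed to a child Python process without touching the parent's own environment:

```python
import os
import subprocess
import sys

# Copy the parent's environment so the child still gets PATH and friends.
child_env = dict(os.environ)
child_env["VARIABLE"] = "value for the child"  # illustrative value

# The child is just another Python interpreter that prints what it sees.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['VARIABLE'])"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
child_saw = result.stdout.strip()
print(child_saw)
```

The parent's os.environ is left alone here; only the dictionary passed as env reaches the child.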
CGI is little more than a set of standard environmental variables
First discussed in 1993, formalized in 1997, the current version (1.1) has been in place since 2004.
From the preamble:
This memo provides information for the Internet community. It does not
specify an Internet standard of any kind.
-- RFC 3875 - CGI Version 1.1: http://tools.ietf.org/html/rfc3875
4. The CGI Request
4.1. Request Meta-Variables
4.1.1. AUTH_TYPE
4.1.2. CONTENT_LENGTH
4.1.3. CONTENT_TYPE
4.1.4. GATEWAY_INTERFACE
4.1.5. PATH_INFO
4.1.6. PATH_TRANSLATED
4.1.7. QUERY_STRING
4.1.8. REMOTE_ADDR
4.1.9. REMOTE_HOST
4.1.10. REMOTE_IDENT
4.1.11. REMOTE_USER
4.1.12. REQUEST_METHOD
4.1.13. SCRIPT_NAME
4.1.14. SERVER_NAME
4.1.15. SERVER_PORT
4.1.16. SERVER_PROTOCOL
4.1.17. SERVER_SOFTWARE
You have a couple of options:
Let's keep it simple by using the Python module
In the class resources for this session, you'll find a directory named cgi.
Make a copy of that folder in your class working directory.
Windows users, you may have to edit the first line of cgi/cgi-bin/cgi_1.py to point to your python executable.
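In a terminal, from the cgi directory, run python -m http.server --cgi
Then load http://localhost:8000/ in your browser
If the script fails to run, check the permissions on cgi-bin and cgi_1.py
The script must be executable, and the cgi-bin directory needs to be readable and executable.
Remember that you can use the bash chmod command to change permissions
in *nix: chmod a+x cgi-bin/cgi_1.py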
Windows users, use the 'properties' context menu to get to permissions, just grant 'full'
Problems with permissions can lead to failure. So can scripting errors.
Open cgi/cgi-bin/cgi_1.py in an editor.
Before the call to cgi.test(), add a single line:
1 / 0
Reload your browser. What happens now?
CGI is famously difficult to debug. There are reasons for this:
Back in your editor, add the following lines, just below import cgi:
import cgitb
cgitb.enable()
Now, reload again.
Let's fix the error from our traceback. Edit your cgi_1.py file to match:
#!/usr/bin/env python
import cgi
import cgitb
cgitb.enable()
cgi.test()
Notice the first line of that script: #!/usr/bin/env python.
This is called a shebang (short for hash-bang)
It tells the system what executable program to use when running the script.
Servers like http.server --cgi run CGI scripts as a system user called nobody.
This is just like you calling:
$ ./cgi-bin/cgi_1.py
In fact try that now in your second terminal (use the real path), what do you get?
Windows folks, you may need C:\>python cgi-bin/cgi_1.py
Notice what is missing?
There are a couple of important facts about CGI that derive from this:
CGI is largely a set of agreed-upon environmental variables.
We've seen how environmental variables are found in python in os.environ
We've also seen that at least some of the variables in CGI are not part of the system environment.
Where do they come from?
Let's find 'em. In a terminal fire up python:
In [1]: from http import server
In [2]: server.__file__
Out[2]: '/Users/cewing/pythons/parts/opt/lib/python3.5/http/server.py'
In [3]: !subl '/Users/cewing/pythons/parts/opt/lib/python3.5/http/server.py'
If you don't have the subl command, or another one that starts your editor, copy this path and open it in your text editor.
From http/server.py, in the CGIHTTPRequestHandler class, in the run_cgi method:
env = copy.deepcopy(os.environ)
env['SERVER_SOFTWARE'] = self.version_string()
env['SERVER_NAME'] = self.server.server_name
env['GATEWAY_INTERFACE'] = 'CGI/1.1'
...
if self.have_fork:
    # Unix -- fork as we should
    ...
    pid = os.fork()
    ...
    try:
        ...
        os.execve(scriptfile, args, env)
    ...
else:
    # Non-Unix -- use subprocess
    import subprocess
    ...
    p = subprocess.Popen(cmdline,
                         ...
                         env = env
                         )
    ...
And that's it, the big secret. The server takes care of setting up the environment so it has what is needed.
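What the server does before launching a script can be sketched without any server at all. The helper below, build_cgi_env, is hypothetical (it is not part of http.server); it only illustrates layering a few of the RFC 3875 meta-variables on top of a copy of an existing environment:

```python
import os


def build_cgi_env(method, script_name, query_string, base=None):
    """Return a copy of the environment with a few CGI meta-variables added.

    Hypothetical helper for illustration; a real server sets many more.
    """
    env = dict(os.environ if base is None else base)
    env["GATEWAY_INTERFACE"] = "CGI/1.1"
    env["REQUEST_METHOD"] = method
    env["SCRIPT_NAME"] = script_name
    env["QUERY_STRING"] = query_string
    return env


# A tiny stand-in base environment keeps the printed output short.
env = build_cgi_env("GET", "/cgi-bin/cgi_1.py", "a=23&b=46",
                    base={"PATH": "/usr/bin"})
for name in sorted(env):
    print(name, "=", env[name])
```

A dictionary like this is what would then be handed to os.execve or subprocess.Popen as the child's environment.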
Now, in reverse. How does the information that a script creates end up in your browser?
A CGI Script must print its results to stdout.
Use the same method as above to import and open the source file for the cgi module. Note what test does for an example of this.
def test(environ=os.environ):
    ...
    print("Content-type: text/html")
    print()
    try:
        form = FieldStorage()   # Replace with other classes to test those
        print_directory()
        print_arguments()
        print_form(form)
        ...
    except:
        print_exception()
What the Server Does:
sends the HTTP/1.1 200 OK\r\n first line to the client
What the Script Does:
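The script's half of the exchange can be sketched as a header block, a blank line, and then the body, all written to stdout. This is an illustrative sketch, not the contents of any class file; the function name cgi_response is made up:

```python
import sys


def cgi_response(body, content_type="text/html"):
    """Build the text a CGI script writes: headers, blank line, body."""
    headers = [
        "Content-Type: {}".format(content_type),
        "Content-Length: {}".format(len(body.encode("utf8"))),
    ]
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    # The status line is left to the server; the script starts at the headers.
    sys.stdout.write(cgi_response("<h1>Hello from CGI</h1>"))
```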
You've seen the output from the cgi.test() function from the cgi module.
Let's make our own version of this.
In the cgi-bin directory you will find the file cgi_2.py.
View your work by loading http://localhost:8000/ and clicking on Exercise One
GO
All this is well and good, but where's the dynamic stuff?
It'd be nice if a user could pass form data to our script for it to use.
In HTTP, data is often passed to the server as a part of a URL called the query string
The URL query string is formatted as name=value pairs, separated by the ampersand (&) character
The entire query string is separated from other parts of the URL by a question mark:
http://localhost:8000/cgi-bin/somescript.py?a=23&b=46&b=92
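To see how such a URL breaks down, the standard library's urllib.parse can split it. This is a sketch built on the example URL; note that the cgi module used in these notes was deprecated in Python 3.11 and removed in 3.13, so urllib.parse is also the portable way to read a query string on newer Pythons:

```python
from urllib.parse import parse_qs, urlsplit

url = "http://localhost:8000/cgi-bin/somescript.py?a=23&b=46&b=92"

query = urlsplit(url).query   # everything after the question mark
params = parse_qs(query)      # maps each name to a list of string values

print(query)        # a=23&b=46&b=92
print(params["a"])  # ['23']
print(params["b"])  # ['46', '92']
```

Every value comes back as a string inside a list, even when a name appears only once.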
In the cgi module, we get access to the query string with the FieldStorage class:
import cgi
form = cgi.FieldStorage()
stringval = form.getvalue('a', None)
listval = form.getlist('b')
Values in a FieldStorage are always strings
getvalue allows you to return a default, in case the field isn't present
getlist always returns a list: empty, one-valued, or as many values as are present
Let's create a dynamic adding machine.
In the cgi-bin directory you'll find cgi_sums.py.
In the index.html file in the cgi directory, the third link leads to this file.
Use cgi.FieldStorage.
Edit cgi_sums.py so that the result of adding all operands sent via the url query is returned.
Include a Content-Type header.
form = cgi.FieldStorage()
operands = form.getlist('operand')
msg = "your total is {total}"
try:
    total = sum(map(int, operands))
    msg = msg.format(total=total)
except (ValueError, TypeError):
    msg = "Unable to calculate a sum, please provide integer operands"
print("Content-Type: text/plain")
print("Content-Length: %s" % len(msg))
print()
print(msg)
Let's take a break here, before continuing
The Web Server Gateway Interface
CGI is great, but there are problems:
How do we overcome this problem?
The most popular approach is to have a long-running process inside the server that handles CGI scripts.
FastCGI and SCGI are existing implementations of CGI in this fashion.
The PHP scripting language works in much the same way.
The Apache module mod_python offers a similar capability for Python code.
This makes it much more difficult to share resources
Enter WSGI, the Web Server Gateway Interface.
Other alternatives are specific implementations of the CGI standard.
WSGI is itself a new standard, not an implementation.
WSGI is generalized to describe a set of interactions.
Developers can write WSGI-capable apps and deploy them on any WSGI server.
Read the original WSGI spec: http://www.python.org/dev/peps/pep-0333
There is also an update for Python 3:
https://www.python.org/dev/peps/pep-3333
WSGI consists of two parts, a server and an application.
A WSGI Server must:
provide a method start_response(status, headers, exc_info=None)
call an application, passing environment and start_response as args
A WSGI Application must:
take an environment and a start_response callable as arguments
call the start_response method.
from some_application import simple_app
def build_env(request):
    # put together some environment info from the request
    return env

def handle_request(request, app):
    environ = build_env(request)
    iterable = app(environ, start_response)
    for data in iterable:
        # send data to client here

def start_response(status, headers):
    # start an HTTP response, sending status and headers

# listen for HTTP requests and pass on to handle_request()
serve(simple_app)
Where the simplified server above is not functional, this is a complete app:
def application(environ, start_response):
    status = "200 OK"
    # the body must be bytes, and header values must be strings
    body = b"Hello World\n"
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-length', str(len(body)))]
    start_response(status, response_headers)
    return [body]
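Because an application is just a callable, it can be exercised without a running server. The sketch below repeats the hello-world app with a bytes body and calls it by hand; fake_start_response and the captured dict are made up for the demonstration, while setup_testing_defaults is a real wsgiref.util helper that fills in a plausible environ:

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    status = "200 OK"
    body = b"Hello World\n"
    response_headers = [("Content-type", "text/plain"),
                        ("Content-length", str(len(body)))]
    start_response(status, response_headers)
    return [body]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # A real server would begin the HTTP response here; we just record it.
    captured["status"] = status
    captured["headers"] = headers


environ = {}
setup_testing_defaults(environ)  # adds REQUEST_METHOD, PATH_INFO, etc.

body = b"".join(application(environ, fake_start_response))
print(captured["status"], body)
```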
A third part of the puzzle is something called WSGI middleware
WSGI Servers:
HTTP <---> WSGI
WSGI Applications:
WSGI <---> app code
The WSGI Stack can thus be expressed like so:
HTTP <---> WSGI <---> app code
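Middleware fits into this picture by looking like an application to the server and like a server to the application. As a hedged sketch (the names add_header_middleware, hello_app and X-Example are invented for illustration), a wrapper can intercept start_response to add a header before passing everything along:

```python
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World\n"]


def add_header_middleware(app, name, value):
    """Wrap a WSGI app so every response carries one extra header."""
    def wrapped(environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            # Append our header, then hand off to the server's callable.
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, custom_start_response)
    return wrapped


application = add_header_middleware(hello_app, "X-Example", "middleware")

seen = {}


def record(status, headers, exc_info=None):
    # Stand-in for the server's start_response; it only records the call.
    seen["status"], seen["headers"] = status, headers


body = b"".join(application({}, record))
print(seen["headers"])
```

The wrapped callable has the same signature as any other WSGI application, so a server cannot tell the difference.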
The Python standard lib provides a reference implementation of WSGI:
You can also deploy with Apache as your HTTP server, using mod_wsgi:
Finally, it is also common to see WSGI apps deployed via a proxied WSGI server:
Seem Familiar?
Let's start simply. We'll begin by repeating our first CGI exercise in WSGI
Find the wsgi directory in the class resources. Copy it to your working directory.
Open wsgi_1.py in your text editor.
We'll use environ, just as we use os.environ in cgi
But First
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    srv = make_server('localhost', 8080, application)
    srv.serve_forever()
Note that we pass our application function to the server factory
We don't have to write a server, wsgiref does that for us.
In fact, you should never have to write a WSGI server.
def application(environ, start_response):
    response_body = body % (
        environ.get('SERVER_NAME', 'Unset'),  # server name
        ...
    )
    status = '200 OK'
    response_headers = [('Content-Type', 'text/html'),
                        ('Content-Length', str(len(response_body)))]
    start_response(status, response_headers)
    return [response_body.encode('utf8')]
We do not define start_response, the server does that.
We are responsible for determining the HTTP status.
And the content we hand back must be bytes, not unicode.
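The bytes requirement also matters for Content-Length, which counts bytes rather than characters. A small sketch (the sample text is arbitrary) shows the two differing once non-ASCII text is involved, which is why computing the length after encoding is the safer habit:

```python
text = "caf\u00e9"             # four characters, the last one is non-ASCII
encoded = text.encode("utf8")  # that last character takes two bytes in UTF-8

print(len(text))     # 4
print(len(encoded))  # 5

# Safer: measure the bytes that will actually be sent.
headers = [("Content-Type", "text/html; charset=utf-8"),
           ("Content-Length", str(len(encoded)))]
```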
You can run this script with python:
$ python wsgi_1.py
This will start a wsgi server. What host and port will it use?
Point your browser at http://localhost:8080/. Did it work?
Go ahead and fill in the missing bits. Use the environ passed into application
WSGI is a long-running process.
The file you are editing is not reloaded after you edit it.
You'll need to quit and re-run the script between edits.
Notice the use of pprint.pprint, check your terminal for useful output.
So now we've learned a bit about the WSGI specification and how a WSGI application can get data that comes in via an HTTP request.
Let's create a multi-page wsgi application.
It will serve a small database of python books.
The database (with a very simple api) can be found in wsgi/bookdb.py
When viewing our first wsgi app, do we see the name of the wsgi application script anywhere in the URL?
In our wsgi application script, how many applications did we actually have?
How are we going to serve different types of information out of a single application?
We have to write an app that will map our incoming request path to some code that can handle that request.
This process is called dispatch. There are many possible approaches.
Let's begin by designing this piece of our app.
Open bookapp.py from the wsgi folder. We'll do our work here.
The wsgi environment gives us access to PATH_INFO.
This value is the path portion of the URI from the client's HTTP request.
We can design the URLs that our app will use to assist us in routing.
Let's declare that any request for / will map to the list page.
We can also say that the URL for a book will look like this:
http://localhost:8080/book/<identifier>
Let's write a function, called resolve_path, in our application file.
import re

def resolve_path(path):
    urls = [(r'^$', books),
            (r'^book/(id[\d]+)$', book)]
    matchpath = path.lstrip('/')
    for regexp, func in urls:
        match = re.match(regexp, matchpath)
        if match is None:
            continue
        args = match.groups([])
        return func, args
    # we get here if no url matches
    raise NameError
We need to hook our new dispatch function into the application.
The path comes from environ.
def application(environ, start_response):
    headers = [("Content-type", "text/html")]
    try:
        path = environ.get('PATH_INFO', None)
        if path is None:
            raise NameError
        func, args = resolve_path(path)
        body = func(*args)
        status = "200 OK"
    except NameError:
        status = "404 Not Found"
        body = "<h1>Not Found</h1>"
    except Exception:
        status = "500 Internal Server Error"
        body = "<h1>Internal Server Error</h1>"
    finally:
        headers.append(('Content-length', str(len(body))))
        start_response(status, headers)
        return [body.encode('utf8')]
Once you've got your script settled, run it:
$ python bookapp.py
Then point your browser at each of these:
http://localhost:8080/
http://localhost:8080/book/id3
http://localhost:8080/book/id73/
http://localhost:8080/sponge/damp
Did that all work as you would have expected?
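To reason about those URLs without a browser, the dispatch logic can be exercised on its own. In this sketch the books and book functions are stand-ins that return short strings, and describe is a made-up helper; only the URL patterns are taken from the notes:

```python
import re


def books():
    return "list page"              # stand-in for the real list view


def book(book_id):
    return "detail for " + book_id  # stand-in for the real detail view


def resolve_path(path):
    urls = [(r'^$', books),
            (r'^book/(id[\d]+)$', book)]
    matchpath = path.lstrip('/')
    for regexp, func in urls:
        match = re.match(regexp, matchpath)
        if match is not None:
            return func, match.groups()
    raise NameError  # no pattern matched


def describe(path):
    """Report what the dispatcher would do with a request path."""
    try:
        func, args = resolve_path(path)
        return func(*args)
    except NameError:
        return "404 Not Found"


for path in ['/', '/book/id3', '/book/id73/', '/sponge/damp']:
    print(path, '->', describe(path))
```

Note that the trailing slash in /book/id73/ keeps it from matching the book pattern, so it falls through to the not-found case.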
The function books should return an html list of book titles where each title is a link to the detail page for that book (/book/<id>)
def books():
    all_books = DB.titles()
    body = ['<h1>My Bookshelf</h1>', '<ul>']
    item_template = '<li><a href="/book/{id}">{title}</a></li>'
    for book in all_books:
        body.append(item_template.format(**book))
    body.append('</ul>')
    return '\n'.join(body)
Quit and then restart your application script:
$ python bookapp.py
Then reload the root of your application:
http://localhost:8080/
You should see a nice list of the books in the database. Do you?
Click on a link to view the detail page. Does it load without error?
The next step of course is to polish up those detail pages.
In this last case, what's the right HTTP response code to send?
def book(book_id):
    page = """
<h1>{title}</h1>
<table>
<tr><th>Author</th><td>{author}</td></tr>
<tr><th>Publisher</th><td>{publisher}</td></tr>
<tr><th>ISBN</th><td>{isbn}</td></tr>
</table>
<a href="/">Back to the list</a>
"""
    book = DB.title_info(book_id)
    if book is None:
        raise NameError
    return page.format(**book)
Quit and restart your script one more time
Then poke around at your application and see the good you've made
And your application is portable and sharable
It should run equally well under any wsgi server
Next steps for an app like this might be:
For your homework this week, you'll be creating a wsgi application of your own.
You'll create an online calculator that can perform several operations
You'll need to support:
Your users should be able to send appropriate requests and get back proper responses:
http://localhost:8080/multiply/3/5 => 15
http://localhost:8080/add/23/42 => 65
http://localhost:8080/divide/6/0 => HTTP "400 Bad Request"
To submit your homework:
Create a repository named wsgi-calc.
Put your script in a file named calculator.py.
It should run with: $ python calculator.py
Your repository should include a README.md file.
Include all instructions I need to successfully run and view your script.
When you are done, send Maria and me an email with a link to your repository.
Next week we will be installing Python packages that are not part of the standard library.
This is a common occurrence in web development. But it can be hazardous.
In order to practice safe development I am going to ask you to read and follow through a brief tutorial I've created on the subject.
If you have any trouble, or if things do not work the way they are supposed to, please reach out. We will need this to be working next week.
For educational purposes, you might wish to take a look at the source code for the wsgiref module. It's the canonical example of a simple wsgi server
>>> import wsgiref
>>> wsgiref.__file__
'/full/path/to/your/copy/of/wsgiref/__init__.py'
...
See you Next Time